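In this paper, a combat unmanned aerial vehicle (UAV) is modeled in a simulation environment. The rotary-wing UAV successfully performs various tasks, such as locking onto a target, tracking it, and sharing the relevant data with surrounding vehicles. Different software technologies are used, including API communication, ground control station configuration, autonomous movement algorithms, computer vision, and deep learning.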
One of the main challenges in electroencephalogram (EEG) based brain-computer interface (BCI) systems is learning subject/session-invariant features to classify cognitive activities within an end-to-end discriminative setting. We propose a novel end-to-end machine learning pipeline, EEG-NeXt, which facilitates transfer learning by: i) aligning the EEG trials from different subjects in Euclidean space, ii) tailoring deep learning techniques to the scalograms of EEG signals to capture better frequency localization for low-frequency, longer-duration events, and iii) utilizing a pretrained ConvNeXt (a modernized ResNet architecture which supersedes state-of-the-art (SOTA) image classification models) as the backbone network via adaptive finetuning. On publicly available datasets (Physionet Sleep Cassette and BNCI2014001), we benchmark our method against SOTA via cross-subject validation and demonstrate improved accuracy in cognitive activity classification along with better generalizability across cohorts.
The development of guidance, navigation, and control frameworks and algorithms for swarms has attracted significant attention in recent years. That said, planning swarm allocations and trajectories for engaging enemy swarms remains a largely understudied problem. Although small-scale scenarios can be addressed with tools from differential game theory, existing approaches fail to scale to large multi-agent pursuit-evasion (PE) scenarios. In this work, we propose a reinforcement learning (RL) based framework that decomposes large-scale swarm engagement problems into a number of independent multi-agent pursuit-evasion games. We simulate a variety of multi-agent PE scenarios in which finite-time capture is guaranteed under certain conditions. The calculated PE statistics are provided as a reward signal to the high-level allocation layer, which uses an RL algorithm to allocate controlled swarm units so as to eliminate enemy swarm units with maximum efficiency. We verify our approach in large-scale swarm-to-swarm engagement simulations.
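Convolutional neural network (CNN) approaches available in the current literature are designed to work primarily with low-resolution images. When they are applied to very large images, challenges arise related to GPU memory, a receptive field smaller than needed for semantic correspondence, and the need to incorporate multi-scale features. The resolution of the input images can be reduced, but only at the cost of a significant loss of critical information. Based on these issues, we introduce a novel research problem of training CNN models for very large images and present the "UltraMNIST dataset", a simple yet representative benchmark dataset for this task. UltraMNIST is designed using the popular MNIST digits, with additional levels of complexity added to replicate the challenges of real-world problems well. We present two variants of the problem: "UltraMNIST classification" and "budget-aware UltraMNIST classification". The standard UltraMNIST classification benchmark is intended to facilitate the development of novel CNN training methods that make effective use of the best available GPU resources. The budget-aware variant is intended to promote the development of methods that work under constrained GPU memory. To support the development of competitive solutions, we present several baseline models for the standard benchmark and its budget-aware variant. We study the impact of reducing resolution on performance and present results for baseline models involving pretrained backbones from popular state-of-the-art models. Finally, with the presented benchmark dataset and baselines, we hope to pave the way for a new generation of CNN methods suitable for handling large images in an efficient and resource-light manner.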
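This paper presents a new multimodal interventional radiology dataset, called the POCAP (Port Catheter Placement) Corpus. The corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 POCAP interventions performed by six surgeons, with an average duration of 81.4 $\pm$ 41.0 minutes. The corpus aims to provide a resource for developing a smart speech assistant in operating rooms. In particular, it may be used to develop a speech-controlled system that enables surgeons to control operation parameters such as C-arm movements and table positions. To record the dataset, we obtained consent from the institutional review board and workers' council of the University Hospital Erlangen, and from the patients with regard to data privacy. We describe the recording setup, data structure, workflow, and preprocessing steps, and report the first POCAP Corpus speech recognition analysis results, with an 11.52% word error rate using pretrained models. The findings suggest that the data has the potential to support robust command recognition systems and will allow the development of novel intervention support systems using speech and image processing in the medical domain.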
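The word error rate reported above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the English example commands are hypothetical (the corpus itself is in German).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical command: one deleted word and one inserted word
# against a four-word reference.
print(word_error_rate("move the table up", "move table up please"))
```

Because insertions are counted, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.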
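Due to the availability of multi-modal remote sensing (RS) image archives, one of the most important research topics is the development of cross-modal RS image retrieval (CM-RSIR) methods that search for semantically similar images across different modalities. Existing CM-RSIR methods require annotated training images of high quality and in large quantity. In operational scenarios, collecting a sufficient number of reliably labeled images is time-consuming, complex, and costly, and may significantly affect the final accuracy of CM-RSIR. In this paper, we introduce a novel self-supervised CM-RSIR method that aims to: i) model mutual information between different modalities in a self-supervised manner; ii) keep the distributions of the modality-specific feature spaces similar to each other; and iii) identify the most similar images within each modality without requiring any annotated training images. To this end, we propose a novel objective comprising three loss functions that simultaneously: i) maximize the mutual information of different modalities to preserve inter-modal similarity; ii) minimize the angular distance of multi-modal image tuples to eliminate inter-modal discrepancies; and iii) increase the cosine similarity of the most similar images within each modality to characterize intra-modal similarities. Experimental results show the effectiveness of the proposed method compared to state-of-the-art methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/ss-cm-rsir.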
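The second and third loss terms are described in terms of angular distance and cosine similarity between image embeddings. The abstract does not give the exact loss formulas, so the following is only a minimal sketch of the two underlying quantities under common definitions (angular distance taken as the angle between two vectors, normalized to [0, 1]). The function names and the toy embedding vectors are hypothetical and are not taken from the linked repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def angular_distance(u, v):
    """Angle between u and v, scaled so that 0 = same direction, 1 = opposite."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(cos) / math.pi


# Hypothetical embeddings of the same scene in two modalities.
embedding_a = [1.0, 0.0]
embedding_b = [0.0, 1.0]
print(cosine_similarity(embedding_a, embedding_b))
print(angular_distance(embedding_a, embedding_b))
```

Minimizing the angular distance of a cross-modal pair and maximizing the cosine similarity of an intra-modal pair both pull the corresponding embeddings toward the same direction; they differ in which pairs they are applied to.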
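Synthetic images created by generative models increase in quality and expressiveness as newer models utilize larger datasets and novel architectures. Although this photorealism is a positive side effect from a creative standpoint, it becomes problematic when such generative models are used for impersonation without consent. Most of these approaches are built on partial transfer between source and target pairs, or they generate completely new samples based on an ideal distribution that still resemble the closest real sample in the dataset. We propose Mixsyn (read as "Mixin") for learning novel fuzzy compositions from multiple sources and creating novel images as a mix of image regions corresponding to the compositions. Mixsyn not only combines uncorrelated regions from multiple source masks into a coherent semantic composition, but also generates mask-aware, high-quality reconstructions of non-existing images. We compare Mixsyn to state-of-the-art single-source sequential generation and collage generation approaches in terms of quality, diversity, realism, and expressive power, while also showcasing interactive synthesis, mix-and-match, and edit propagation tasks with no mask dependency.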
Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
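The role of optimization here can be made concrete with the standard identity that underlies variational inference. The notation below (observations $x$, latent variables $z$, and a parametric variational family $q_\phi$) is an assumption of this sketch and is not taken from the tutorial itself.

```latex
\begin{align}
\log p(x)
  &= \underbrace{\mathbb{E}_{q_\phi(z)}\!\left[\log p(x, z) - \log q_\phi(z)\right]}_{\mathrm{ELBO}(\phi)}
   + \mathrm{KL}\!\left(q_\phi(z) \,\|\, p(z \mid x)\right) \\
  &\ge \mathrm{ELBO}(\phi).
\end{align}
```

Because the KL term is non-negative and $\log p(x)$ does not depend on $\phi$, maximizing the ELBO over $\phi$ both tightens the lower bound on the marginal likelihood and moves $q_\phi(z)$ toward the posterior $p(z \mid x)$.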
Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which draws on the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers that are better suited to re-weighting query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.